Geospatial analysis provides a distinct perspective on the world, a unique lens through which to examine events, patterns, and processes that operate on or near the surface of our planet. Ultimately geospatial analysis concerns what happens where, and makes use of geographic information that links features and phenomena on the Earth’s surface to their locations.
There are a few key concepts to cover when it comes to spatial information. These are:
At the centre of all spatial analysis is the concept of place. People identify with places of various sizes and shapes, from the room to the parcel of land, to the neighbourhood, to the city, the country, the state or the nation state. Places often have names, and people use these to talk about and distinguish between places. Names can be official. Places also change continually as people move. The basis of a rigorous and precise definition of place is a coordinate system, a set of measurements that allows place to be specified unambiguously and in a way that is meaningful to everyone.
Attribute has become the preferred term for any recorded characteristic or property of a place. A place’s name is an obvious example of an attribute. But there can be other pieces of information, such as the number of crimes in a neighbourhood, or the GDP of a country. Within GIS the term ‘attributes’ usually refers to records in a data table associated with individual features in a vector map or cells in a grid (raster or image file). These data behave exactly like the data you have encountered in your data analysis courses: the rows represent observations, and the columns represent variables. The variables can be numeric or categorical, and depending on which they are, you can apply different methods to make sense of them.
In spatial analysis it is customary to refer to places as objects. An object can be a whole country, or a road. In studies of climate change, the objects of interest might be weather stations of minimal extent, which will be represented as points. On the other hand, studies of social or economic patterns may need to consider the two-dimensional extent of places, which will therefore be represented as areas. These representations of the world, made up of points, lines, and areas (polygons), are part of what is called the vector data model: a representation of the world using points, lines, and polygons. Vector models are useful for storing data that have discrete boundaries, such as country borders, land parcels, and streets.
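To make this concrete, here is a minimal sketch of the three vector geometry types using the sf package (which we will use later in this tutorial). All the coordinates are made up for illustration:

```r
library(sf)  # simple features for R

# a point, e.g. a weather station
pt <- st_point(c(383958, 398732))

# a line, e.g. a road
ln <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))

# an area (polygon), e.g. a neighbourhood; the first and last
# coordinate pairs must match to close the ring
poly <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2),
                              c(0, 2), c(0, 0))))

# confirm what we have built: POINT, LINESTRING, POLYGON
st_geometry_type(st_sfc(pt, ln, poly))
```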
Objects can also be represented as raster data. Raster data are made up of pixels (or cells), and each pixel has an associated value. Simplifying slightly, a digital photograph is an example of a raster dataset where each pixel value corresponds to a particular colour. In GIS, the pixel values may represent elevation above sea level, chemical concentrations, rainfall, and so on. The key point is that all of these data are represented as a grid of (usually square) cells. You will most likely be dealing with vector data in your internships, so we will be focusing on that.
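To illustrate the idea without any extra packages, a plain R matrix can stand in for a small raster: a grid of cells, each holding a value. The rainfall numbers here are made up:

```r
# a 3 x 3 'raster': a grid of cells, each with an associated value
# (here, hypothetical rainfall in mm)
rainfall <- matrix(c(12, 15, 14,
                     10, 18, 16,
                      9, 11, 13),
                   nrow = 3, byrow = TRUE)

# draw the grid; each cell is coloured according to its value
image(rainfall)
```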
Historically maps have been the primary means to store and communicate spatial data. Objects and their attributes can be readily depicted, and the human eye can quickly discern patterns and anomalies in a well-designed map.
Map projections try to portray the surface of the earth or a portion of the earth on a flat piece of paper or computer screen. A coordinate reference system (CRS) then defines, with the help of coordinates, how the two-dimensional, projected map in your GIS is related to real places on the earth. The decision as to which map projection and coordinate reference system to use, depends on the regional extent of the area you want to work in, on the analysis you want to do and often on the availability of data.
A traditional method of representing the earth’s shape is the use of globes. When viewed at close range the earth appears to be relatively flat. However, when viewed from space, we can see that the earth is relatively spherical. Maps are representations of reality. They are designed to represent not only features, but also their shape and spatial arrangement. Each map projection has advantages and disadvantages. The best projection for a map depends on the scale of the map, and on the purposes for which it will be used. For your purposes, you just need to understand that there are essentially different ways to flatten out the earth, in order to get it onto a two-dimensional map.
The process of creating map projections can be visualised by positioning a light source inside a transparent globe on which opaque earth features are placed, and projecting the feature outlines onto a two-dimensional flat piece of paper. Different projections can be produced by wrapping the paper around the globe as a cylinder, as a cone, or holding it against the globe as a flat surface. Each of these methods produces what is called a map projection family. Therefore, there is a family of planar projections, a family of cylindrical projections, and another called conical projections (see figure_projection_families).
figure_projection_families
With the help of coordinate reference systems (CRS) every place on the earth can be specified by a set of three numbers, called coordinates. In general CRS can be divided into projected coordinate reference systems (also called Cartesian or rectangular coordinate reference systems) and geographic coordinate reference systems.
The use of geographic coordinate reference systems is very common. They use degrees of latitude and longitude, and sometimes also a height value, to describe a location on the earth’s surface. The most popular is called WGS 84. This is the one you will most likely be using: if you get your data in latitude and longitude, then this is the CRS you are working in.

It is also possible that you will be using a projected CRS. This two-dimensional coordinate reference system is commonly defined by two axes at right angles to each other, forming a so-called XY-plane. The horizontal axis is normally labelled X, and the vertical axis is normally labelled Y. Working with data in the UK, you are most likely to be using the British National Grid (BNG). The Ordnance Survey National Grid reference system is a system of geographic grid references used in Great Britain, distinct from latitude and longitude. In this case, points will be defined by “Easting” and “Northing” rather than “Longitude” and “Latitude”. It basically divides the UK into a series of squares, and uses references to these to locate something. The most common usage is the six-figure grid reference, employing three digits in each coordinate to determine a 100 m square. For example, the grid reference of the 100 m square containing the summit of Ben Nevis is NN 166 712. Grid references may also be quoted as a pair of numbers: eastings then northings in metres, measured from the southwest corner of the SV square. For example, the grid reference for Sullom Voe oil terminal in the Shetland Islands may be given as HU396753 or 439668,1175316
BNG
This will be important later on when we are linking data from different projections, or when you look at your map and you try to figure out why it might look “squished”.
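As a sketch of what this looks like in practice, the sf package (introduced later in this tutorial) can convert a point between WGS 84 (EPSG code 4326) and British National Grid (EPSG code 27700). The coordinates here are illustrative:

```r
library(sf)

# a point in WGS 84: note the order is longitude, latitude (x, y)
pt_wgs84 <- st_sfc(st_point(c(-2.2426, 53.4808)), crs = 4326)

# reproject to British National Grid:
# the result is an easting and a northing, in metres
pt_bng <- st_transform(pt_wgs84, 27700)
st_coordinates(pt_bng)
```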
We already mentioned the lines that constitute objects of spatial data, such as streets, roads, railroads, etc. Networks constitute one-dimensional structures embedded in two or three dimensions. Discrete point objects may be distributed on the network, representing phenomena such as landmarks or observation points. Mathematically, a network forms a graph, and many techniques developed for graphs have applications to networks. These include various ways of measuring a network’s connectivity, or of finding the shortest path between pairs of points on a network. You can have a look at the lesson on network analysis in the QGIS documentation.
One of the more useful concepts in spatial analysis is density - the density of humans in a crowded city, or the density of retail stores in a shopping centre. Mathematically, the density of some kind of object is calculated by counting the number of such objects in an area, and dividing by the size of the area. To read more about this, I recommend Silverman, Bernard W. Density Estimation for Statistics and Data Analysis. Vol. 26. CRC Press, 1986.
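The calculation itself is simple. For example, with made-up numbers:

```r
# density = number of objects / size of the area they fall in
n_stores <- 340    # hypothetical count of retail stores
area_km2 <- 2.5    # hypothetical area of the shopping district, in km^2
n_stores / area_km2
# [1] 136          # stores per square kilometre
```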
Right, so hopefully this gives you a few things to think about. Be sure that you are confident about:
You will often need a boundary shapefile for your data analysis. Sometimes you will be given spatial data to begin with, such as a shapefile, or point coordinates with latitudes and longitudes (or eastings and northings). But other times you might have to find this yourself, and join the non-spatial data to these. This latter case is what the first part of this tutorial will demonstrate. In this case, you will have to source the spatial data yourself.
You can acquire spatial data from various sources. An example is Census Boundary Data. You can read more about that here. “Boundary data are a digitised representation of the underlying geography of the census”. Census Geography is often used in research and spatial analysis because it is divided into units based on population counts, created to form comparable units, rather than administrative boundaries such as wards or police force areas. However depending on your research question and the context for your analysis, you might be using different units. The hierarchy of the census geographies goes from Country to Local Authority to Middle Layer Super Output Area (MSOA) to Lower Layer Super Output Area (LSOA) to Output Area:
Here we will get some boundaries for Manchester. Let’s use the LSOA level. These are geographical regions designed to be more stable over time and consistent in size than existing administrative and political boundaries. LSOAs comprise, on average, 600 households that are combined on the basis of spatial proximity and homogeneity of dwelling type and tenure. Neighbourhoods are often operationalised as LSOAs.
So to get some boundary data, you can use the UK Data Service website. There is a simple Boundary Data Selector: https://borders.ukdataservice.ac.uk/bds.html
When you get to the link, you will see on the top there is some notification to help you with the boundary data selector. If you are feeling unsure at any point, feel free to click on that help to guide you.
For now, let’s focus on the selector options. Here you can choose the country you want to select shapefiles for; we select “England”. You can also choose the type of geography you want to use; here we select “Statistical Building Block”, as discussed above. And finally you can select the date you want boundaries for. If you are working with historical data, it makes sense to find boundaries that match the timescale of your data. Here we will be dealing with contemporary data, and therefore we want to use the newest available boundary data.
Once you have selected these options, click on the “Find” button. That will populate the box below:
Here you can select the boundaries you want. As discussed, we want the census lower layer super output areas. But again, your choice here will depend on what data you want to be mapping.
Once you’ve made your choice, click on “List Areas”. This will populate the box below. We are here concerned with Manchester. However, you can select more than one area if you want boundaries for more than one. Just hold down “ctrl” to select multiple areas individually, or the shift key to select everything in between.
Once you’ve made your decision click on the “Extract Boundary Data” button. You will see the following message:
You can bookmark the page, or just stay on it and wait. How long you have to wait will depend on how much data you have requested to download.
When your data is ready, you will see the following message:
You have to right click on “BoundaryData.zip”, and hit “Save Target As” on a PC or “Save Link As” on a Mac:
Navigate to the folder you have created for this analysis, and save the .zip file there. Extract the file contents using whatever you like to use to unzip compressed files. You should end up with a folder called “BoundaryData”. Have a look at its contents:
So you can see immediately that there is some documentation about the usage of this shapefile, in the readme and the terms and conditions. Have a look at these, as they contain information about how you can use this map. For example, all your maps will have to mention where you got the data from. So since you got this boundary data from the UKDS, you will have to note the following:
“Contains National Statistics data © Crown copyright and database right [year] Contains OS data © Crown copyright [and database right] (year)”
You can read more about this in the terms and conditions document.
But then you will also notice that there are four files with the same name, “england_oac_2011”. It is important that you keep all these files in the same location as each other! They each contain different bits of information about your shapefile:
Sometimes there might be more files associated with your shapefile as well, but we will not cover them here.
There is a myth about the scientist and the messy workspace, typically illustrated with Albert Einstein:
However, many of us need order to be able to work properly. Organised workspaces are also prominent, as we can see from these famous examples:
(Galileo, Marie Curie, John Dalton, Voltaire, Alan Turing, Charles Dickens)
When working in R, you have to consider your workspace. It helps immensely to keep your code and notes organised. You will likely have a project folder, where you save your data, your graphs, your analysis outputs, etc. Here we will show how to designate a folder where R will save things such as outputs and scripts, and also where it will read data in from.
Create a folder to save your data and your outputs in. In R, this is known as a working directory. So firstly, before we begin to do any work, we should create our working directory. This is simply a folder where you will save all your data, and also where you will be reading data in from. You can create a new folder where you will save everything for this project, or you can choose an existing folder. It’s advised that you create a new folder and give it a meaningful name you will remember. Generally, try to avoid spaces and special characters in folder (and file) names. It’s not necessarily a good idea to just dump everything onto ‘Desktop’ either, as you want to be able to find things later, and maybe keep things tidy.
Anyway, once you have a folder identified, you will need to know the path to this folder. That is the route that you will be using in your code to read/write files from/to the right directory. You will often get errors about certain things “not found” due to incorrect file paths, so it’s important to find the correct path. You now need to tell R the path to this folder. You can do this a few ways; here I’ll show two:
You can simply set your working directory using the graphical user interface of RStudio:
Click on Session > Set working directory > Choose directory…
Set working directory
Then navigate to the folder you want to use, open it, and click on ‘Open’.
This will have set your working directory to the folder you just selected.
The function to set the working directory is setwd(). Inside the brackets you need to write the path to your folder, in quotation marks. So for me this is:
setwd("/Users/reka/Desktop/course-incubator")
If I copy that into the R script and run it, then my working directory will be set to this folder, called “course-incubator” on my Desktop.
So for you to be able to use this method, you need to find the file path to your folder. How do you find this? There are multiple ways of finding the correct path on both Macs and PCs; I will give an example for each here.
On a Mac you can find the path to a specific file or folder by first opening Terminal, then opening Finder and navigating to the folder or file. Once you have found it, just drag and drop the folder or file into the Terminal window. This will print out the path to your file or folder. You can then copy this, and paste it into the setwd() function.
Find mac file path
On a PC, you can find a path to a file or folder by navigating to it using Windows Explorer and, once there, copying the file path from the top bar. This is illustrated by the red circle below:
Find pc file path
NOTE: When you copy a file path on a PC, you will have to change the direction of the slashes before passing it as a text string in R. You will have to replace all backslashes (\) with forward slashes (/).
For example: C:\Users\mesike\Desktop\dokumentumok should become C:/Users/mesike/Desktop/dokumentumok
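If you would rather fix the slashes in R itself, gsub() can do the replacement. Note that inside an R string a literal backslash must itself be escaped as \\, which is why the pattern below is written "\\\\":

```r
# a Windows-style path, with each backslash escaped in the R string
win_path <- "C:\\Users\\mesike\\Desktop\\dokumentumok"

# replace every backslash with a forward slash
gsub("\\\\", "/", win_path)
# [1] "C:/Users/mesike/Desktop/dokumentumok"
```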
Whichever way you choose, once you have done this you can save all your data in this folder, and read it in from here. Any outputs like plots, maps and code get saved here as well.
OK, so we will be using the sf package. Install it if you don’t already have it.
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
On Mac and Linux a few requirements must be met to install sf. These are described in the package’s README at github.com/r-spatial/sf.
To read in your data, you will need to know the path to where you have saved it. Ideally this will be in your working directory.
Let’s create an object and assign it our shapefile’s name:
shp_name <- "england_lsoa_2011.shp"
Make sure that this is saved in your working directory, and you have set your working directory.
Now use the st_read() function to read in the shapefile:
manchester_lsoa <- st_read(shp_name)
## Reading layer `england_lsoa_2011' from data source `/Users/reka/Dropbox (The University of Manchester)/31152_60142 GIS and Crime Mapping/2018_labs/shapefiles/england_lsoa_2011.shp' using driver `ESRI Shapefile'
## Simple feature collection with 282 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 378833.2 ymin: 382620.6 xmax: 390350.2 ymax: 405357.1
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
Now you have your spatial data file. You can have a look at what sort of data it contains, the same way you would view a dataframe, with the View() function:
View(manchester_lsoa)
## Simple feature collection with 6 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 381453.3 ymin: 387925 xmax: 386933 ymax: 397654.3
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
## label name code
## 1 E08000003E02001062E01005066 Manchester 018E E01005066
## 2 E08000003E02001092E01005073 Manchester 048C E01005073
## 3 E08000003E02001062E01005061 Manchester 018A E01005061
## 4 E08000003E02001062E01005062 Manchester 018B E01005062
## 5 E08000003E02001062E01005063 Manchester 018C E01005063
## 6 E08000003E02001062E01005065 Manchester 018D E01005065
## geometry
## 1 POLYGON((384850 397432, 384...
## 2 POLYGON((382221.097 388869....
## 3 POLYGON((386745.92 397293.4...
## 4 POLYGON((384601 397195, 384...
## 5 POLYGON((386140.989 396808....
## 6 POLYGON((385738 397442, 385...
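Notice that the printout above shows “epsg (SRID): NA”, even though the proj4string corresponds to British National Grid. If the missing EPSG code causes you problems later (for example when joining data in different projections), you can label the data explicitly with st_crs(). Note this only declares the CRS; it does not reproject anything:

```r
# tell sf that the data are in British National Grid (EPSG:27700)
st_crs(manchester_lsoa) <- 27700
st_crs(manchester_lsoa)
```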
And of course, since it’s spatial data, you can finally map it:
plot(manchester_lsoa)
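Note that plot() on an sf object draws one small map for each attribute column. If you only want to see the boundaries themselves, you can plot just the geometry:

```r
# plot the polygon outlines only, without the attribute maps
plot(st_geometry(manchester_lsoa))
```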
This is the main way that we will be creating maps. We will leave it at this for today.
So this is a super brief intro into some of the cool things you can do with leaflet. There are comprehensive tutorials available online, for example here.
Leaflet is the leading open-source JavaScript library for mobile-friendly interactive maps. It is extremely popular, used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB.
In this section of the lab we will learn how to make really flashy looking maps using leaflet.
You will need to have installed the following packages to follow along:
install.packages("leaflet") #for mapping
install.packages("RColorBrewer") #for getting nice colours for your maps
Once you have them installed, load them up with the library() function:
To make a map, just load the leaflet library:
library(leaflet)
You then create a map with this simple bit of code:
m <- leaflet() %>%
  addTiles()
And just print it:
m
You might of course want to add some content to your map.
You can add a point manually:
m <- leaflet() %>%
  addTiles() %>%
  addMarkers(lng=-2.230899, lat=53.464987, popup="You are here")
m
Or many points manually:
latitudes = c(53.464987, 53.472726, 53.466649)
longitudes = c(-2.230899, -2.245481, -2.243421)
popups = c("You are here", "Here is another point", "Here is another point")
df = data.frame(latitudes, longitudes, popups)
m <- leaflet(data = df) %>%
  addTiles() %>%
  addMarkers(lng=~longitudes, lat=~latitudes, popup=~popups)
m
You can change the background as well. You can find a list of different basemaps here.
m <- leaflet(data = df) %>%
  addProviderTiles("Stamen.Toner") %>%
  addMarkers(lng=~longitudes, lat=~latitudes, popup=~popups)
m
You will most likely want to add data to your map from external sources, rather than manually creating points.
For example, I illustrate here with data from Manchester Open Data about public toilets:
publicToilets <- read.csv("http://www.manchester.gov.uk/open/download/downloads/id/171/public_toilets.csv")
Often spatial data will not come with latitude/longitude format, but with easting and northing. Leaflet (as far as I know) prefers lat/long so we might have to convert from BNG to WGS84.
First thing we might notice is that the coordinates are in Easting and Northing format, rather than Latitude/ Longitude:
publicToilets[,8:9]
## GeoX GeoY
## 1 383958 398732
## 2 384245 398416
## 3 383873 398647
## 4 383643 398363
## 5 383950 398449
## 6 383194 397863
## 7 383312 398333
## 8 383923 398633
## 9 383864 398033
There is a comprehensive step-by-step tutorial on converting coordinates here. I’ll just briefly demo this here.
#the library I'm using here is rgdal
library(rgdal)
## Loading required package: sp
## rgdal: version: 1.2-8, (SVN revision 663)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
## Linking to sp version: 1.2-4
#these are the variables for the coordinate system types
bng = "+init=epsg:27700"
latlong = "+init=epsg:4326"
#create coords
coords <- cbind(Easting = as.numeric(as.character(publicToilets$GeoX)),
                Northing = as.numeric(as.character(publicToilets$GeoY)))
# create a SpatialPointsDataFrame
publicToiletsSPDF <- SpatialPointsDataFrame(coords, data = publicToilets, proj4string = CRS(bng))
#reproject with spTransform
publicToiletsSPDF_latlng <- spTransform(publicToiletsSPDF, CRS(latlong))
#extract coords into a column
publicToiletsSPDF_latlng@data$lng <- publicToiletsSPDF_latlng@coords[,1]
publicToiletsSPDF_latlng@data$lat <- publicToiletsSPDF_latlng@coords[,2]
Now you should have a reprojected spatial points data frame with latitude and longitude, ready to be mapped:
m <- leaflet(data = publicToiletsSPDF_latlng@data) %>%
  addProviderTiles("Stamen.Toner") %>%
  addMarkers(lng=~lng, lat=~lat, popup=~LocationText)
m
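As an aside, the same conversion can also be done with the sf package we used earlier, avoiding the rgdal/sp route altogether. This is a sketch assuming publicToilets has been read in as above, with its coordinates in the GeoX and GeoY columns:

```r
library(sf)

# turn the data frame into an sf object, declaring the coordinates
# to be eastings/northings in British National Grid (EPSG:27700)
toilets_sf <- st_as_sf(publicToilets, coords = c("GeoX", "GeoY"),
                       crs = 27700)

# reproject to WGS 84 (EPSG:4326), the lat/long CRS leaflet expects
toilets_wgs84 <- st_transform(toilets_sf, 4326)

# extract longitude/latitude columns for use with addMarkers()
coords <- st_coordinates(toilets_wgs84)
toilets_wgs84$lng <- coords[, 1]
toilets_wgs84$lat <- coords[, 2]
```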